By default, this model uses the `bert_12_768_12` model with extra layers for QA jobs.

After that, to be able to use it in Java, we need to export the dictionary from the script so that text can be parsed into actual indexes. Please add the following lines after [this line](https://github.com/dmlc/gluon-nlp/blob/master/scripts/bert/staticbert/static_finetune_squad.py#L262).
Should we add an option to this file itself (i.e. create a PR) to export the vocabulary?
Interesting idea. @lanking520 did the original exploration and documentation on this. What do you think?

For this tutorial, you can get the model and vocabulary by running the following bash script. The script uses `wget` to download these artifacts from AWS S3.

From the `scala-package/examples/scripts/infer/bert/` folder run:
Shouldn't this be within the Clojure bert-qa example folder?
     :token2idx (get vocab "token_to_idx")}))

(defn tokens->idxs [token2idx tokens]
  (mapv #(get token2idx % (get token2idx "[UNK]")) tokens))
`(let [unk-idx (get token2idx "[UNK]")] ...)`?
much better
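To make the suggestion above concrete, here is a minimal sketch of the same lookup with the `[UNK]` index resolved once rather than once per token. It is written in Python purely for illustration and is not the PR's Clojure code. The toy vocabulary and the function name are made up; only the `[UNK]` key and the fall-back behaviour come from the snippet under review.

```python
def tokens_to_idxs(token_to_idx, tokens):
    """Map tokens to vocabulary indexes, falling back to the [UNK] index."""
    # Resolve the fallback once, mirroring (let [unk-idx ...] ...) in Clojure.
    unk_idx = token_to_idx["[UNK]"]
    return [token_to_idx.get(token, unk_idx) for token in tokens]


# Hypothetical toy vocabulary, for illustration only.
toy_vocab = {"[UNK]": 0, "what": 1, "is": 2, "clojure": 3}
print(tokens_to_idxs(toy_vocab, ["what", "is", "bert"]))  # [1, 2, 0]
```

Hoisting the lookup changes nothing about the result. It only avoids repeating the `[UNK]` lookup for every token and makes the fallback explicit.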

(defn get-vocab []
  (let [vocab (json/parse-stream (clojure.java.io/reader "model/vocab.json"))]
    {:idx2token (get vocab "idx_to_token")
nit: `{:idx->token ... :token->idx ...}`?
    (break-out-punctuation s target-char)
    [s]))

(defn tokenizer [s]
`tokenize`?
lgtm! this is great!! only minor comments/suggestions
Thanks so much for the feedback @kedarbellare - I'll work on implementing it 😸
       (map #(string/replace % "<punc>" str-match))))

(defn break-out-punctuations [s]
  (if-let [target-char (first (re-seq #"[.,?!]" s))]
I did see some tokens like `...` in your example; maybe get those covered as well.
Good point. I was changing the data examples around and I don't have one with that in there now, but I'll keep my eye out for them in the future.
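For reference, one way the ellipsis case could be handled is sketched below in Python rather than the PR's Clojure. The punctuation set `[.,?!]` is taken from the regex in the diff. The function name and the choice to keep `...` as a single token are assumptions for illustration, not something the PR implements.

```python
import re

# Try the three-dot ellipsis first so it is not split into three "." tokens.
_TOKEN_RE = re.compile(r"\.\.\.|[.,?!]|[^\s.,?!]+")


def split_punctuation(text):
    """Split text into words and punctuation marks, keeping '...' whole."""
    return _TOKEN_RE.findall(text)


print(split_punctuation("Wait... is this BERT?"))
# ['Wait', '...', 'is', 'this', 'BERT', '?']
```

Because the alternation is ordered, the longer `...` pattern wins over the single-character class. Whether a whole `...` token is what the model's vocabulary expects would still need to be checked against `vocab.json`.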
* Initial working example for bert qa
* add RAT rename core to infer add integration test
* add rat for project.clj
* Couldn’t resist adding a qa about Clojure
* rat for readme
* feedback from @kedarbellare
Description
Thanks to @lanking520 and the JVM team - we were able to convert the Java BERT QA example to the Clojure package 💯
This makes a slight change to it by using an external `edn` file to store the sample question and answers along with the ground truths, which makes it possible to process multiple examples and easier to edit and add more.

Example output:
Changes
New example code for BERT QA based off the Java example along with a new integration test
Comments